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AMENDMENTS 



In the specification 

£lease amend the paragraph starting at page 46, line 17, as follows: 



A "comparison window", as used herein, includes reference to a segment of any one of the 
number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 
to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference 
sequence of the same number of contiguous positions after the two sequences are optimally aligned. 
Methods of alignment of sequence for comparison are well-known in the art. Optimal alignment of 
sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & 
Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & 
Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of person & Lipman, Proc. 
Natl Acad. Sci. USA 85:2444, 1988, by computerized implementations of these algorithms (GAP, 
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer 
Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection. Other algorithms 
for determining homology or identity include, for example, in addition to a BLAST program (Basic 
Local Alignment Search Tool at the National Center for Biological Information), ALIGN, AMAS 
(Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), ASSET 
(Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological 
Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA, Intervals & 
Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, 
Smith- Waterman algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment 
Tool), Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis 
Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence 
Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program), MACAW 
(Multiple Alignment Construction & Analysis Workbench), MAP (Multiple Alignment Program), 
MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence Alignment), SAGA (Sequence 
Alignment by Genetic Algorithm) and WHAT-IF. Such alignment programs can also be used to 
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screen genome databases to identify polynucleotide sequences having substantially identical 
sequences. A number of genome databases are available, for example, a substantial portion of the 
human genome is available as part of the Human Genome Sequencing Project (J. Roach, 
http://w e b e r.u. Washington. e du/ - roach/human_ g e nom e _ progr e ss 2.html) (Gibbs, 1995). At least 
twenty-one other genomes have already been sequenced, including, for example, M. genitalium (Fraser 
et al, 1995), M. jannaschii (Bult et al, 1996), H. influenzae (Fleischmann et al, 1995), E. coli (Blattner 
et al, 1997), and yeast (5. cerevisiae) (Mewes et al, 1997), andD. melanogaster (Adams et al, 2000). 
Significant progress has also been made in sequencing the genomes of model organism, such as mouse, 
C. elegans, and Arabadopsis sp. Several databases containing genomic information annotated with 
some functional information are maintained by different organization, and are accessible via the internet, 
for example, see the databases available on the world wide web at the web sites for The Institute for 
Genomic Research, The University of Wisconsin, at Stanford University, The National Center for 
Biotechnology Information fNCBD, The European Bioinforatics Institute, The Institute Pasteur, The 
Whitehead Institute of the Massachusetts Institute of Technology, and the HIV Databases at the Los 
Alamos National Laboratory web site, http://wwwtigr.org/tdb; http://www.gen e tics.wisc. e du; 
http://g e nom e www.stanford. e du/ - ball; http://hiv w e b.lanl.gov; http://www.ncbi.nlm.nih.gov; 
http://www.ebi.ac.ulc: http://Past e ur.fr/oth e ^iology - and http://www:g e nom e .wi.mit. e du . 



Please amend the paragraph starting at page 47, line 29, as follows: 

Une example ot a usetul algorithm is BLAST and BLAST 2.0 algorithms ; which are — — 

described in Altschul et al, Nuc. Acids Res. 25:3389-3402, 1977, and Altschul et al, J. Mol. Biol. 
215:403-410, 1990, respectively. Software for performing BLAST analyses is publicly available 
through the National Center for Biotechnology Information world wide web site 
(http://www.ncbi.nlm.nih.gov) . This algorithm involves first identifying high scoring sequence 
pairs (HSPs) by identifying short words of length W in the query sequence, which either match or 
satisfy some positive-valued threshold score T when aligned with a word of the same length in a 
database sequence. T is referred to as the neighborhood word score threshold (Altschul et al, 
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supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs 
containing them. The word hits are extended in both directions along each sequence for as far as the 
cumulative alignment score can be increased. Cumulative scores are calculated using, for 
nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0). 
For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of 
the word hits in each direction are halted when: the cumulative alignment score falls off by the 
quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to 
the accumulation of one or more negative-scoring residue alignments; or the end of either sequence 
is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of 
the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length 
(W) of 1 1, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid 
sequences, the BLASTP program uses as defaults a word length of 3, and expectations (E) of 10, 
and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 
89:10915, 1989) alignments (B) of 50, expectation (E) of 10, M=5, N= -4, and a comparison of both 
strands. 



Please amend the paragraph starting at page 49, line 12, as follows: 



The BLAST programs identify homologous sequences by identifying similar segments, 
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic 
acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid 
sequence database. High-scoring segment pairs are preferably identified (i.e. 9 aligned) by means 
of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is 
the BLOSUM62 matrix (Gonnet et al., Science 256: 1443-1445, 1992; Henikoff and Henikoff, 
Proteins 17:49-61, 1993). Less preferably, the PAM or PAM250 matrices may also be used (see, 
e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of 
Protein Sequence and Structure, Washington: National Biomedical Research Foundation). 
BLAST programs are accessible through the U.S. National Library of Medicine, e.g., at The 
National Center for Biotechnology Information web site-w ww.ncbi.nlm.nih.gov . 
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Please amend the paragra ptLSlM ing at page 49, line 28, as follows: 

Figure 13 is a flow diagram illustrating one embodiment of an identifier process 300 for 
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and 
then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a 
memory 1 15 in the computer system 100. The process 300 then moves to a state 306 wherein a 
database of sequence features is opened. Such a database would include a list of each feature's 
attributes along with the name of the feature. For example, a feature name could be "Initiation 
Codon" and the attribute would be "ATG" Another example would be the feature name 
"TAATAA Box" and the feature attribute would be "TAATAA". An example of such a 
database is produced by the University of Wisconsin Genetics Computer Group (GCG) and 
available at the GCG world wide web site (www.gcg.com) . Alternatively, the features may be 
structural polypeptide motifs such as alpha helices, beta sheets, or functional polypeptide motifs 
such as enzymatic active sites, helix-turn-helix motifs or other motifs known to those skilled in 
the art. 
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